N91-22320 


Serial and Parallel Computation of Kane's 
Equations for Multibody Dynamics 


by 

Amir Fijany 

Jet Propulsion Laboratory , California Institute of Technology 





INTENTIONALLY RAM 



o 


PRECEDING page 


blank not filmed 


Objective: 


Analysis of the Efficiency of Algorithms resulting from 
Kane's Equation for Serial and Parallel Computation of Mass 
Matrix. 


Overview: 

* Algorithms resulting from Kane's Equation and Modified 
Kane's Equation. 


* Analysis of two Classes of Algorithms for Computation of 
Mass Matrix: The Newton-Euler Based Algorithms and the 
Composite Rigid-Body Algorithms. 


* Analysis of the Efficiency of Different Algorithms for 
Serial and Parallel Computation. 


* Conclusion 
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Multibody Dynamics 


Case Study: Rigid Multibody as Specialized to a Single Chain 
Robot Manipulator. 


Multibody Dynamics: Solution for Q from 

AQ=r-b =r o) 

A: nxn Symmetric Positive definite Mass Matrix 
Q: nxl Vector of Generalized Accelerations 
Z: nxl Vector of Applied Forces/Torques 
b: nxl Vector of nonlinear Terms (Bias vector) 
f 1 : nxl Vector of Applied Inertia Forces/Torques 


The 0(n 3 ) Algorithms for Multibody Dynamics: 

1) Computation of b andf 1 - 

2) Computation of Mass Matrix A. 

3) Solution of Eq. (1) by Inversion of A. 


Kane's Equation is widely used for Computation of Mass 
Matrix. 
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Kane's Method: Notation 


Q: nxl Vector of Generalized Coordinates 


U: nxl Vector of Generalized Speeds 


n 

Choice of U: U = V AQ+B 

1 L i j j i 
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Angular and Linear Velocity of Body (Link) i 
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w : Angular Velocity of Body i 

( j : jth Partial Angular Velocity of Body i 
w : Angular Velocity of Remainder Terms 

1 v ) 

* 

V : Linear Velocity of Center of Mass of Body i 

* 

V : jth Partial Linear Velocity of Center of Mass 

i ( j) 

of Body i 

* 

V : Linear Velocity of Center of Mass of Body i 


Remainder Terms 
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Kane's Method: Notation 


Partial Angular and Linear Momentum 


N = I 0) 
— 1 ( j ) 


F = m V 
-Kj) l-i(j) 


N jth Partial Angular Momentum of Body i 

: jth Partial Linear Momentum of Body i 


Kane's Equation for Computation of Mass Matrix 


The element a of Mass Matrix A is Computed as 


ij 
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t = Y v .F + w .N 
ij L ~k(i) — k ( j) — k ( i ) -k(j) 
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k ( i ) k“k(J) — k ( i ) =k~k(j) 
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Kane's Equation: Analysis of General Case 


For Analysis of the General Case, We Set U 

u) = Z and w =0 

i ( j ) — j i ( t ) 


M. = ) Z Q. 

1 L . . J J 


j=l 

V = (Z x P ) and V 

i ( j ) j i*, j i (t) 


= 0 


V* = V (Z x P )Q 
-i L. , -j -I*,J J 
j=l 


N = I a) = I Z 

— i ( j ) — i — i ( j ) =i“j 


F = m (Z x P ) 

i ( j) i j i*, j 


Kane’s Equation can be written as 



).m (Z x P ) + Z . I Z 

k — j k* , j “i =k“j 
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AN 0(n 3 ) Algorithm Based on Kane’s Equation 


For i = 1, 2, n 

For j = i, i+1, . . • , n 


a 

i j 


n 

-i 


k=j 


(Z x 

'i 
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) . m (Z x P 
k ~j — k* 


) + Z . I Z 

-i =k— j 


This Algorithm is Designated as Original Kane’s Equation (OKE) 
Algorithm. 
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Modified Kane's Equation 
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a = 

i j 


Y CZ.x P ) . m (Z x P ) + Z .1 Z 
^ k=J J k*, j k -x ~k*, i -j =k~i 


n 

= Y 2 . (P x (m Z.x P ) + Z . I Z 

, j k*,j k i — k*, i -j =k~i 


k=j 
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= Y 2 . ( (P X (m Z x P .) + I z ) 
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AN 0(n 2 ) Algorithm Based on Kane’s Equation 


For i = 1, 2, ...» n 

For j = i, i+1, . . . , n 
P = P + P 


~ j, i 

— J-l, i 

“j, j-l 

P 

= P + 
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— J*,i 

“j, i 

— j 
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= I z 


— j ( i ) 

=j-i 
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For j = n, n-1 , ...» i 

f = F + f 

— j(i) — j ( i ) -j + l(i) 


S x F 

-j — JCi) 


+ P x f 

”J+1.J j+l ( i ) 


a 

ij 


= Z . n 
~j ~ 


j(i) 


This Algorithm is Designated as Variant of Kane’s Equation (VKE) 


Algorithm. 
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Algorithms for Computation of Mass Matrix 



For the conditions given as 

Qj = 1 and Qj = Qj^j = 0 For j = 1, 2, .... n (3) 

Two physical interpretations of Eqs. (2) & (3) lead to two 
classes of algorithms for computation of mass matrix: 

1 . The Newton-Euler Based (N-E B) Algorithms. 

Underlying Physical Concept: Propagation of acceleration 
among rigidly connected bodies. 

The Variant of Kane's Equation (VKE) Algorithm belongs to 
this class. 

2. The Composite Rigid-Body (CRB) Algorithms. 

Underlying physical Concept: Propagation of force among 
rigidly connected bodies. 
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Algorithms for Computation of Mass Matrix 


Clearly, the two physical interpretations are the same. 

We have shown that the algorithms of the two classes can be 
transformed to one another. 

From an algorithmic point of view, the main difference 
between the algorithms of the two classes is the presence of 
a two-dimensional recursion in Composite Rigid-Body 
Algorithms. 

The main issue is to determine the best algorithm(s) for 
serial and parallel computation. 

The Original Kane's Equation Algorithm is the least efficient 
since its computational complexity is of 0(n 3 ). 

The computational complexity of both the Newton-Euler 
Based Algorithms and Composite Rigid-Body Algorithms is of 
0(n 2 ). However, the Composite Rigid-Body Algorithms, in 
general, are more efficient. 
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Algorithms for Computation of Mass Matrix 


There are four major redundancies in the Original 

Newton-Euler Based Algorithm which can be removed by: 

1) Optimizing the Newton-Euler Formulation for the 
conditions given in Eq. (3), 

2) Using a variant of Newton-Euler Formulation , 

3) Choosing a better coordinate frame for projection of 
equations. 

4) Introducing a two-dimensional recursion in the 
computation which transforms it to an equivalent 
Composite Rigid-Body Algorithm. 
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A Variant of Newton-Euler Based Algorithm 


Step 1: 

For j = 1 , 2, . . . , n 

For i j> j+1, • • • » ^ 
o>(i, j) = Z( j) 

V(i, j) = V(i-l,j) + u(i, j)xP(i,i-l) 

_F(i+l, i, j) = m(i)V(i,j) + u>(i, j)xh(i) 

N(i+l,i,j) = k_(i)w(i, j) 

Step 2: 

For i = n, n-1 , . . . , j 

2^(n+l,n+l, j) = N(n+l,n+l, j) = 0 
F(n+l,i,j) = F(i+l,i,j) + F(n+1, i+1, j) 

N(n+l,i,j) =N(i+l,i,j) + N(n+1, i+1, j) + 
P(i+l,i)xF(n+l,i+l, j) 

= Z(i).N(n+l, i+1, j) 

This algorithm results from removing the first two redundancies 

2 

of the 0 N-E B Algorithm. It is clearly equivalent to the 0(n ) 
algorithm resulting from the Kane’s Equation or the Variant of 
Kane’s Equation (VKA) Algorithm. iQ2 



m(i) 
h(i ) 
k(i) 
Z(i) 

PCi, j) 

w(i, j) 
V(i, j) 
F(k+1, i. 


N(k+1, i, j 


Mass of body i. 

First moment of mass of body i about point 0 

i 

Second moment of mass of body i about point 0 

i 

Axis of joint i 

Position vector from point j to point i. 

Angular acceleration of body i resulting from the 
unit acceleration of joint j. 

Linear acceleration of body i (point 0.) 
resulting from the unit acceleration of joint j. 

) Force exerted on point CL due to the acceleration 
of bodies i through k, i.e. , the bodies contained 
between points CL and 0 , resulting from the 

unit acceleration of joint j. 

) Moment exerted on point CL due to the acceleration 

of body i through k, resulting from the unit 
acceleration of joint j. 
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A Variant of Composite Rigid-Body Algorithm 


Step 1: 

For i = n, n-1 , ...» 1 
M(i ) = m(i) + M( i+1 ) 

H(i ) = h(i) + _H( i+1 ) + M(i+1 )P(i+l, i) 

A A. 

K(i ) = k(i) + K(i+1) -M( i+1 )P(i+l, i)|(i+l, i) 

A A A A 

|(i+l, i)H(i+l) - H(i+l)P(i+l,i) 

f(i) = Z(i)xHCi) 

n(i) = K(i)Z(i) 

a = Z(i).n(i) 
ii — — 

Step 2: 

For j = i-1, i-2, . . . , 1 

f(j) = fCj+1) 

n(j) = n_(j+l) + P( j+1, j)xf ( j+1) 
a = Z( j) . n( j) 

ji — — 
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M(i) 
H(i ) 
K(i ) 


Mass of composite rigid-body i composed of bodies i 
through n. 

First moment of mass of composite rigid-body i about 
point Ch. 

Second moment of mass of composite rigid-body i 
about point Ch. 
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Comparison of Serial Efficiency of Different 

Algorithms 


In order to study the relative efficiency of the algorithms, 
the optimal choice of coordinate frame(s) for projection of 
the Equations should be carefully analyzed. 


For the Variant of Newton-Euler Algorithm, projection of all 
equations onto any fixed frame leads to maximum 
computational efficiency; It requires 0(n) transformations. 
Projection onto the body frame leads to copmputational 
inefficiency; it requires 0(n 2 ) transformations! 

For the Variant of Composite Rigid-Body Algorithm, 
projection of Step 1 onto body frame and Step 2 onto any 
fixed frame leads to maximum computational efficiency; It 
requires 0(n) transformations. 

Projection of both steps onto the body frame leads to 
copmputational inefficiency; it requires 0(n 2 ) 
transformations! 
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Comparison of Serial Efficiency of Different 

Algorithms 


Redundancy 

► 

®- ®- ® © 0 

V C R-B 0 C R-B V N-E B 0 N-E B OKA 

Serial Efficiency 

•4 

OKEA: Original Kane’s Equation Algorithm. 

0 N-E B: Original Newton-Euler Based Algorithm. 

V N-E B: Variant of Newton-Euler Based Algorithm. 

0 C R-B: Original Composite Rigid-Body Algorithm. 

V C R-B: Variant of Composite Rigid-Body Algorithm. 


Algorithm 

Mul. 

General 

Add. 

Mul. 

n = 6 
Add. 

Total 

V N-E B 

(39/2)n 2 + 

(195/2)n-95 

19n 2 + 

55n-66 

1192 

948 

2140 

V C R-B 

(9/2 )n 2 + 
(231/2)n-181 

4n 2 + 

88n-137 

644 

535 

1179 


I9Q 






(b) 


Computational Structure of and Data-Dependency in Algorithms for 

Mass Matrix 

a) The Newton-Euler Based Algorithms 
b) The Composite Rigid-Body Algorithms 
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Algorithmic Choice for Parallel Computation of 

Mass Matrix 


Parallelism in Computation of Mass Matrix: Time and 
Processors Bouds 

We have shown that the time lower bound in computation of 
mass matrix is of 0(log 2 n) and can be achieved by using 0(n 2 ) 
processors. 

The Original Kane's Equation Algorithm might seem very 
suitable for parallel computation since all elements of the 
mass matrix can be computed totally in parallel. 

The computation of each element of mass matrix can be 
performed in 0(log 2 n) steps by using O(n) processors. Hence, 
in order compute all the elements in parallel and achieve the 
time lower bound of 0(log 2 n) , 0(n 3 ) processors are required! 


Using both the Newton-Euler Based Algorithms and the 
Composite Rigid-Body Algorithms, the mass matrix can be 
computed in 0(log 2 n) steps with only 0(n 2 ) processors. 
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Algorithmic Choice for Parallel Computation of 

Mass Matrix 


The Newton-Euler Based Algorithms are more suitable for 
parallel computation due to their regular computational 
structure and a lesser degree of data-dependency in their 
computation. 

1) They provide a high degree of coarse grain parallelism: 
The columns of the mass matrix can be computed in 
parallel. 

2) They are more regular and have a finer grain: 

A higher degree of parallelism in computation of the 
elements of each column can be exploited 

3) Their parallel computation on a two-dimensional 
processor array requires simpler communication and 
synchronization mechanisms. 


Choice of Coordinate Frame for Parallel Computation on a 
two-dimensional processor array: 

For the Variant of Newton-Euler Based Algorithm it is more 
efficient to project the equations of onto the End-effector 
(Body n) frame while for the Variant of Composite Rigid-Body 
Algorithm it is more efficient to project the equations onto 
the base frame! 
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Conclusion 


* For recursive serial computation, the Variant of Composite 
Rigid-Body Algorithm is significantly more efficient than 
the Variant of Newton-Euler and the Variant of Kane's 
Equation Algorithms. 


* For parallel computation with 0(n 2 ) processors, i.e., 
maximum exploitation of parallelism, the Variant of 
Newton-Euler and the Variant of Kane's Equation Algorithms 
are not only significantly more efficient than the Variant of 
Composite Rigid-Body Algorithm but they also require 
much simpler architectural features. 


* For parallel computation with 0(n) processors, i.e., limited 
exploitation of parallelism, the Variant of Composite 
Rigid-Body Algorithm is more efficient than the Variant of 
Newton-Euler and the Variant of Kane's Equation Algorithms 
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Comparison of Two Classes of Serial and Parallel 
Algorithms for Computation of Mass Matrix 


Algorithm 

Computation Cost 

SP 

Proc. 

SA 

VCR-B 

General n = 6 

( ( 9/2 ) m+4a ) ) n 2 + 644m+535a 

( (231/2)m+88a) )n- 

(181m+137a) 

- 

1 

VN-EB 

( (39/2) 19a) n 2 + 1192m+948a 

( (195/2)m+55a)n- 
( 95m+66a ) 


1 

PA 

VCR-B 

(48m+63a) [log^n] + 244m+254a 

(100m+65a) 

2. 40 

1 

n(n+l)/2 

j 

VN-EB 

(33m+33a) [log 2 n"| + 208zn+188a 

(109m+89a) 

2. 98 

n(n+l )/2 

PPA 

VCR-B 

(9m+8a)n+ (48m+63a) |"log 2 n"] + 256/n+261a 

(58m+24a) 

2. 32 

1 

n j 

| 

VN-EB 

(39m+38a)n+ (27m+18a) [’log 2 n] + 340m+280a 
(25m-2a) 

1. 90 

i 

n 


SA: Serial Algorithm. 

2 

PA: Parallel Algorithm with 0(n ) processors. 


PPA: Parallel Algorithm with 0(n) processors. 
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Parallel VNEB algorithm 


Step 1: 

1) Parallel compute H(j+1, j) by all processors of Row j. 

For J — 1. 2 n 

For 1 = 1, 2 j 

PR Ji : R( j+1, j ) 

2) Parallel compute R(n+1, j) by processors of Column i. 

For i = 1 , 2, . . . , n 

For j = i, i+1 n 

For 7) = 1 step 1 until riog^n+l-i)] , Do 
H(j+2 7) , j) = R(n+1, j) 

j+2 7} >j+2 1?_1 2:n+l 

RCj+2 77 , j) = R(n+1, j) = R(n+1, j+2 7J-1 )R(j+2 7 >~\ j) 

j+2 7, 2=n+l> j+2 7,_1 

RCj+2 1 *, j) = R(j+2\ j+2 7) " 1 )R(j+2 7,_1 , j) 

n+l>j+2 7) >j+2 7,_1 

End_Do 

3) Shift R(n+1, j+1) by processors of Row j+1 to the processors of 
Row j. 

For j = 1, 2 n 

For i = 1, 2, . . . , j 
PRj t : R(n+1, j+1 ) 

with R(n+1 , n+1 ) = U (Unit Matrix) 
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4) Parallel compute n+1 Z(j), n+1 P(j+l,j), and n+1 H(j) by all 
processors of Row j. 

For j = 1, 2 n 

For i = 1 , 2, .... j 

a) PR : n+1 Z(j) = R(n+1, j) J Z(j) 
with ^Z(j) =[001] 

b) PR : n+1 P(j + l,j) = R(n+1, j+l) J+1 P(j+l, j) 

c) PR : n+1 S(j) = R(n+1, j+l) J+1 S(j) 

d) PR jt : n+1 H(j) = H(j) n+1 S(j) 

Step 2: 

1) Parallel compute P(j+l,i) and w(j,i) by processors of Column i. 
For i = 1 , 2, .... n 
For j = i, i+1, ■ • • , n 

For T7 = 1 step 1 until [log 2 (n+l-i )] , Do 
w(j+2 7} ,i) = u(j+2 7) "\i) = Z(i) 

P(j+2 7) , j) = P( j+1, i) 

j+2 7) >j+2 7, " 1 in+l 

PCj+2 11 , j) = P( j+1, i) = P(j+2 7) , j+2 7) ‘ 1 )+P(j+2 7) " 1 , j) 

j+2 7) 2:n+l> j+2 7>_1 

PCj+2 77 , j) = P(j+2 7} , j+2 7) ' 1 )+P(j+2 7) “ 1 , j) 

n+l>j+2 7, >j+2 7)_1 

End_Do 
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2) Parallel compute V(j,i), F(j+l,j,i), and N(j+l,j,i) by 
processors of Column i. 

For i = 1, 2, . . . , n 
For j = i, i+1 , . . . n 

a) PR : V( j, i ) = u( j, i)xP( j+1, i) = 2(i)xP( j+1, i) 

b) PR : F( j + 1, j, i ) = w( j, i)xH( j)+M( j)V( j, i) 

c) PH : N (j + 1, j, i ) = R(n+1, j + 1) f j+1 K( j )R( j+1 , n+1 ) n+1 a>( j , 1)1 + 

J V > 

H( j )xV( j, i ) 

Step 3: 

1) Parallel compute F(n+l,j,i) processors of Column i. 

For i = 1, 2, . . . , n 

For j = i, i + 1 n 

For 7) = 1 step 1 until [log 2 (n+1- j )] , Do 
FCj+2 77 , j,i) = F (n+1 , j , i ) 

j+2 7) >j+2 7,_1 in+l 

FC j+2 77 , j, i ) = F (n+1 , j , i ) = F(j+2 7) , j+2 1 *" 1 , i)+F( j+2 7 *" 1 , j,i) 

j+2 1) >n+l>j+2 1r) “ 1 
F(j+2 77 ,j,i) = F( j+2 17 , j+2 7, ' 1 ,i)+F(J+2 7l “ 1 , j,i) 

n+l>j+2 T) >j+2 7) " 1 

End Do 
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2) Shift F(n+1 , j+1, i ) by processors of Row j+1 to processors of 
Row j. 

For j = 1, 2, .... n 
For i = 1, 2, . . . , j 
PR : F(n+1, j+1, i) 

3) Parallel compute N(n+l,j,i) by processors of Column i. 

For i = 1 , 2, . . . , n 

For j = i, i+1, . . . , n 

a) PR : N( j+1, j, i) = N( j+1, j, i)+P( j+1, j)xF(n+l, j+1, i) 

b) For t) = 1 step 1 until [log (n+1- j )] , Do 

N(j+2 77 , j,i) = N(n+1, j, i) 

j+2 7) >j+2 7?_1 ^n+l 

N(j+2 7? ,j,i) = N(n+1, j, i) = N(n+1, j+2 T| ~ 1 , i)+N( j+2 17 " 1 , j, i) 

j+2 7 Wn>j+2 7rl 

N(j+2 7] ,j,i) = N(j+2 7) , j+2 7 *" 1 , i)+N(j+2 7} "\ j,i) 

n+l>j+2 7J >j+2 7,_1 

End_Do 

2) Parallel compute a by PR . 

* ji Ji 

For i = 1, 2, . . . , n 

For j = i, i + 1, . . . n 

PR : a = Z( j).N(n+l, j, i) 
ji ji 
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Algori thm- To- A rchi tecture Mappin g 


Determination of an Algorithmically-Speciaslized Parallel 
Architecture for Efficient Implementation of the Algorithm. 

1) Processors Interconnection and Communication 
Complexity 

For perfect mapping: 

a) The required interconnection among processors of each 
column is Shuffle Exchange augmented with Nearest-Neighbor 
(SENN). 

b) The required interconnection among processors of each 
row is Nearest-Neighbor. 

The perfect mapping leads to a communication complexity of 
0(log 2 n). Mapping on an array with nearest-neighbor 

interconnection leads to the communication complexity of 
0(n). 

2) Synchronization Mechanism 

Exploitation of parallelism at two computational levels: 

a) Coarse grain parallelism in computing columns of mass 
matrix, and 

b) Fine grain parallelism in computing the elements of each 
column. 

Global Clock-Based Synchronization Mechanism (similar to 
Systolic Array) for processors of each column, and Local Data 
Driven (similar to Wavefront Array) for processor of each 
row. 
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